Principal Component Analysis (PCA)

Run a principal component analysis (PCA) with holopheno.

Import libraries and load data

import holopheno
import pandas as pd
from palmerpenguins import load_penguins

# Load the Palmer penguins dataset as a pandas DataFrame
penguins = load_penguins()

# Categorical grouping variables (x) and numeric measurements (y)
x_columns = ['species', 'island', 'sex']
y_columns = ['bill_length_mm',
             'bill_depth_mm',
             'flipper_length_mm',
             'body_mass_g']
penguins_h = holopheno.read_data(penguins, x = x_columns, y = y_columns);
Data info: 

sample_size 333
unique species values ['Adelie' 'Gentoo' 'Chinstrap']
unique island values ['Torgersen' 'Biscoe' 'Dream']
unique sex values ['male' 'female']
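The summary above can be cross-checked in plain pandas. This is a minimal sketch assuming that read_data drops every row with a missing value in any of the x or y columns, which would be consistent with 333 of the 344 penguins remaining; holopheno may do more (or something different) internally. The helper summarize is hypothetical, and the small DataFrame is synthetic.

```python
import numpy as np
import pandas as pd


def summarize(df, x, y):
    """Hypothetical helper: drop incomplete rows and report what remains."""
    # Assumption: rows with a missing value in any x or y column are dropped
    complete = df.dropna(subset=x + y)
    info = {"sample_size": len(complete)}
    for col in x:
        # unique() keeps the order of first appearance
        info[f"unique {col} values"] = list(complete[col].unique())
    return complete, info


# Small synthetic frame standing in for the penguins data
df = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Adelie", "Chinstrap"],
    "sex": ["male", "female", np.nan, "female"],
    "body_mass_g": [3750.0, 5000.0, 3400.0, np.nan],
})
complete, info = summarize(df, x=["species", "sex"], y=["body_mass_g"])
print(info)
```

With the real data you would call summarize(penguins, x_columns, y_columns) and compare the result with the Data info printout.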

With holopheno you can easily run a PCA on the scaled data (scaled_data).

.dim_red_by_pca() performs a principal component analysis on the scaled data and, by default, plots the explained variance so you can inspect it.

pca, f = penguins_h.dim_red_by_pca()
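For intuition about the explained variance reported here, the following is a minimal NumPy sketch of a PCA on standardized data. It assumes that "scaled" means z-scoring each measurement column (mean 0, standard deviation 1) and uses an SVD; holopheno's own scaling and solver may differ, and pca_on_scaled is a hypothetical name. The data are synthetic.

```python
import numpy as np


def pca_on_scaled(X):
    """Hypothetical helper: PCA of column-standardized data via SVD."""
    X = np.asarray(X, dtype=float)
    # Assumed scaling: z-score each column
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    variances = S**2 / (len(Z) - 1)      # variance captured by each PC
    ratio = variances / variances.sum()  # explained-variance ratio
    return Vt, ratio


rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
# Four strongly correlated columns, loosely mimicking body measurements
X = np.hstack([base + 0.3 * rng.normal(size=(200, 1)) for _ in range(4)])
components, ratio = pca_on_scaled(X)
print(np.round(ratio, 3))             # share of variance per PC
print(np.round(np.cumsum(ratio), 3))  # cumulative variance explained
```

Because the four columns share one underlying signal, the first component should capture most of the variance; the cumulative curve is what you would read off when choosing n_components.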

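.transform_with_pca(pca) projects the data onto the principal-component (PC) space. Here the PCA is refit with three components and the variance plot turned off.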

# Refit with 3 components and no variance plot, then project onto PC1 to PC3
pca = penguins_h.dim_red_by_pca(n_components = 3, plot_variance_explained = False)
penguins_h.transform_with_pca(pca)
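Projecting onto the PC space amounts to multiplying the standardized data by the first few principal axes. The NumPy sketch below illustrates that standard projection, assuming z-score scaling of each column; it is not holopheno's code, and project_onto_pcs is a hypothetical helper. The PC1, PC2, PC3 column names mirror the ones this notebook uses for plotting.

```python
import numpy as np
import pandas as pd


def project_onto_pcs(X, n_components):
    """Hypothetical helper: standardize X and return its first PC scores."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # assumed z-score scaling
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T          # coordinates in PC space
    names = [f"PC{i + 1}" for i in range(n_components)]
    return pd.DataFrame(scores, columns=names)


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # correlated columns
pcs = project_onto_pcs(X, n_components=3)
print(pcs.head())
# PC scores are mutually uncorrelated and their variances decrease
print(np.round(np.corrcoef(pcs.to_numpy(), rowvar=False), 6))
print(pcs.var().round(3))
```

The uncorrelated, variance-ordered scores are what make a plot of the first three PCs a compact summary of the four original measurements.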

Now you can easily visualize the original data in the PC space, here as a 3D scatter plot colored by species.

# %matplotlib inline
# Map each species to a fixed color
palette = {'Adelie': 'orangered', 'Gentoo': 'steelblue', 'Chinstrap': 'seagreen'}
f_PC_3d = penguins_h.scatter_3d(['PC1', 'PC2', 'PC3'], color_by = 'species', palette = palette, type = 'scaled')
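If you want a comparable figure in plain matplotlib, for example to customize it further, a minimal sketch follows. It assumes the PC scores and the species labels already sit in one DataFrame (synthetic data here) and reuses the same species-to-color palette; scatter_pcs_3d is a hypothetical helper, not part of holopheno.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers "3d" on older matplotlib)


def scatter_pcs_3d(df, pcs, color_by, palette):
    """Hypothetical helper: 3D scatter of PC scores, one color per group."""
    fig = plt.figure(figsize=(6, 5))
    ax = fig.add_subplot(projection="3d")
    for group, sub in df.groupby(color_by):
        ax.scatter(sub[pcs[0]], sub[pcs[1]], sub[pcs[2]],
                   color=palette[group], label=group, s=15)
    ax.set_xlabel(pcs[0])
    ax.set_ylabel(pcs[1])
    ax.set_zlabel(pcs[2])
    ax.legend(title=color_by)
    return fig, ax


rng = np.random.default_rng(2)
df = pd.DataFrame(rng.normal(size=(60, 3)), columns=["PC1", "PC2", "PC3"])
df["species"] = np.repeat(["Adelie", "Gentoo", "Chinstrap"], 20)
offsets = {"Adelie": -2.0, "Gentoo": 2.0, "Chinstrap": 0.0}
df["PC1"] += df["species"].map(offsets)  # separate the groups along PC1
palette = {"Adelie": "orangered", "Gentoo": "steelblue", "Chinstrap": "seagreen"}
fig, ax = scatter_pcs_3d(df, ["PC1", "PC2", "PC3"], "species", palette)
```

Drawing one scatter call per group keeps the legend labels tied to the palette colors, which is the same effect color_by and palette have in the holopheno call.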